-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
type-stable variable
#256
base: master
Are you sure you want to change the base?
type-stable variable
#256
Conversation
The failing tests appear related to adding Surprisingly, variable(ds, "temperature", Float32, ("lon",)) appears to pass the test, but fail when run in REPL REPL errorjulia> variable(ds, "temperature", Float32, ("lon",))
100-element NCDatasets.Variable{Float32, 1, NCDataset{Nothing, Missing}}
Error showing value of type NCDatasets.Variable{Float32, 1, NCDataset{Nothing, Missing}}:
ERROR: MethodError: Cannot `convert` an object of type
Tuple{Int64, Int64} to an object of type
Tuple{T} where T
Closest candidates are:
convert(::Type{T}, ::Tuple{Vararg{Any, N}}) where {N, T<:Tuple}
@ Base essentials.jl:457
convert(::Type{T}, ::T) where T<:Tuple
@ Base essentials.jl:456
convert(::Type{T}, ::T) where T
@ Base Base.jl:84
...
Stacktrace:
⋮ internal @ Base
[3] (Tuple{T} where T)(x::Tuple{Int64, Int64})
@ Base ./tuple.jl:386
[4] _chunking(v::NCDatasets.Variable{Float32, 1, NCDataset{Nothing, Missing}})
@ NCDatasets ~/Documents/CliMA/other-repos/NCDatasets.jl/src/variable.jl:306
[5] haschunks
@ ~/Documents/CliMA/other-repos/NCDatasets.jl/src/variable.jl:467 [inlined]
[6] _show(io::IOContext{Base.TTY}, mime::MIME{Symbol("text/plain")}, A::NCDatasets.Variable{Float32, 1, NCDataset{Nothing, Missing}})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/show.jl:15
[7] show(io::IOContext{Base.TTY}, mime::MIME{Symbol("text/plain")}, A::NCDatasets.Variable{Float32, 1, NCDataset{Nothing, Missing}})
@ NCDatasets ~/.julia/packages/DiskArrays/bZBJE/src/show.jl:5
⋮ internal @ OhMyREPL, REPL, Base.Multimedia, AbbreviatedStackTraces, REPL.LineEdit, Unknown
[24] (::REPL.var"#62#68"{REPL.LineEditREPL, REPL.REPLBackendRef})()
@ REPL /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-release-1-dot-10/usr/share/julia/stdlib/v1.10/REPL/src/REPL.jl:386
Use `err` to retrieve the full stack trace. edit: It appears the code successfully runs. but unsuccessfully prints to terminal. To see this, run e.g. variable(ds, "temperature", Float32, ("lon",)); # note the ";" however, attempting to index into the resulting variable always fails. |
As in Julia we have e.g. using NCDatasets
import NCDatasets: Variable, _jltype, nc_inq_varid, nc_inq_vardimid, nc_inq_vartype
function Variable{T,N}(ds::NCDataset,varname) where {T,N}
varid = nc_inq_varid(ds.ncid,varname)
dimids = nc_inq_vardimid(ds.ncid,varid)
nctype = _jltype(ds.ncid,nc_inq_vartype(ds.ncid,varid))
ndims = length(dimids)
# proper error message to add
@assert nctype == T
@assert ndims == N
# reverse dimids to have the dimension order in Fortran style
return Variable{T,N,typeof(ds)}(ds,varid, (reverse(dimids)...,))
end
fname = tempname()
ds = NCDataset(fname,"c")
defDim(ds,"lon",100)
defDim(ds,"lat",110)
v = defVar(ds,"temperature",Float32,("lon","lat"))
data = [Float32(i+j) for i = 1:100, j = 1:110];
v[:,:] = data;
close(ds)
ds = NCDataset(fname)
v = Variable{Float32,2}(ds,"temperature") It seems that the type-instability does not spread: function foo(ds)
v = Variable{Float32,2}(ds,"temperature")
v[:,:]
end
@code_warntype foo(ds)
The type check is still type unstable, but it will not spread. I think it is better to return an error for some inconsistent user input. May the JET test can only be run on julia 1.10? On julia 1.6, a relatively old version of JET is installed (JET v0.4.6 versus v0.8.29 on 1.10). |
I like the idea of For example, this should let you skip typecheck function Variable{T,N,checktypes}(ds::NCDataset,varname) where {T,N,checktypes}
varid = nc_inq_varid(ds.ncid,varname)
dimids = nc_inq_vardimid(ds.ncid,varid)
if checktypes
nctype = _jltype(ds.ncid,nc_inq_vartype(ds.ncid,varid))
ndims = length(dimids)
# proper error message to add
@assert nctype == T
@assert ndims == N
end
# reverse dimids to have the dimension order in Fortran style
return Variable{T,N,typeof(ds)}(ds,varid, (reverse(dimids)...,))
end
function Variable{T,N}(ds::NCDataset,varname) where {T,N}
Variable{T,N,true}(ds::NCDataset,varname)
end One note: There is still type instability even if you remove the assertions. This is due to the uninferred length of |
Do you have a real-word use case where checking the type and dimension would be a problem? There are already a quite a few internal checks in the netcdf c library. I am curious to know where these additional checks would be a problem. Note that also Array{Int,2} errors if you provide size e.g. (2,3,4).
Of course! :-) using NCDatasets
import NCDatasets: Variable, _jltype, ncType, nc_inq_varid, nc_inq_vardimid, nc_inq_vartype
function Variable{T,N}(ds::NCDataset,varname) where {T,N}
ncid = ds.ncid
varid = nc_inq_varid(ncid,varname)
dimids = nc_inq_vardimid(ncid,varid)
xtype = nc_inq_vartype(ncid,varid)
ndims = length(dimids)
if ncType[T] != xtype
error("Variable '$varname' has the type $(_jltype(ncid,xtype)) (instead of $T)")
end
if ndims != N
error("Variable '$varname' has $ndims dimension(s) (instead of $N)")
end
# reverse dimids to have the dimension order in Fortran style
dimeids_reversed = ntuple(N) do i
dimids[end-i+1]
end
return Variable{T,N,typeof(ds)}(ds,varid, dimeids_reversed)
end
fname = tempname()
ds = NCDataset(fname,"c")
defDim(ds,"lon",100)
defDim(ds,"lat",110)
v = defVar(ds,"temperature",Float32,("lon","lat"))
data = [Float32(i+j) for i = 1:100, j = 1:110];
v[:,:] = data;
close(ds)
ds = NCDataset(fname)
varname = "temperature"
v = Variable{Float32,2}(ds,varname)
# the only type-unstable code is in the error message to provide julia types (e.g. Float64 or Vector{Float64} rather than netCDF code like the value of NC_DOUBLE, NC_VLEN
@code_warntype Variable{Float32,2}(ds,varname)
using Test
@test_throws ErrorException Variable{Float32,3}(ds,varname)
@test_throws ErrorException Variable{Int32,2}(ds,varname) |
@dennisYatunin can you chip in here? |
Sure! It sounds like we're planning to read from around a dozen netcdf files on every update to our atmosphere model, and each of these unstable type checks seems to add a few kilobytes of allocations. After a several thousand timesteps (each of which includes multiple model evaluations on different Runge-Kutta stages), this will amount to nearly a gigabyte of allocations. For now, our model is still runs at around one timestep per second, so the garbage collector could easily keep up with these allocations. In the future, though, we will probably be able to run simple configurations much more quickly, and this garbage collection will take up a larger fraction of our runtime. We still have a fair number of allocations triggered by our atmosphere model as it runs, so this particular type instability is not a huge issue at the moment. However, our end goal is to have as few allocations and instabilities as possible, and it's good to tackle simple issues like this as they come up. That being said, we don't necessarily need the non-allocating version of this constructor to be part of the NCDatasets API. If this is considered too unsafe to be publicly exported, we could always just write our own version of it. |
Just to be clear, you are not instantiating a I use typically this approach: ds = NCDataset('file.nc')
ncvar = ds["your_variable"]
buffer = zeros(Float32,2,3,4)
for n = 1:nmax
step!(n,ncvar,buffer)
end where function step!(n,ncvar,buffer)
NCDatasets.load!(ncvar,buffer,:,n)
# some useful with buffer
end Are you doing something like this already? Or is this not possible? |
Currently, we do instantiate This is because I only keep the dataset open for the duration I read into memory, then close it, using code (roughly) on the format function read_data!(file, var, buffer, n)
NCDataset(file) do ds
NCDatasets.load!(variable(ds, var), buffer, :, n)
end
end I suppose in principle we could open the dataset at the start of the simulation, then create the variables we need to access at every time step, and store them in the cache (which is what you suggest above). opening the file "directly" ( A few comments/questions:
|
I would prefer to keep the discussion here a bit more focused. As I mentioned above, it is preferable to instantiate
As far as I know the number of open files is an OS setting, see here I would be surprised if you hit a limit with 500. Yes, I am not aware of a way to load the netcdf data directly into the GPU memory. Feel free to open a thread at the julia discourse forum and ping me there for a more general discussion. 😀 |
This is an attempt to write a type-stable variant of
variable
, which enables completely type stable reading of data from NetCDF files.Currently, reading data is not type stable since at read-time Julia doesn't know the type of the data to be read, nor the number of dimensions. See for example:
output
In some cases you have both these pieces of information. This PR extends
variable
with another signature on the formvariable(ds, varname, DType, dimnames)
whereds
andvarname
is as before. Now also supplied is the data typeDType
(e.g.Float32
) and dimension namesdimnames
(e.g.("lon", "lat")
or("time",)
).Supplying incorrect
dimnames
errors. However, correctdimnames
, but ordered incorrectly, does not error (and in fact gives "correct" results) unless you go out-of-bonds of the stored data (see tests for an example).More worryingly, providing incorrect
DType
gives garbage output with no warning. As such, it might be preferable to give this function a more scary name such asunsafe_variable
and not export it (likeload!
).example of incorrect loading
Other info
create dataset for above examples